Smoothing issues in the structured language model
Abstract
The Structured Language Model (SLM) recently introduced by Chelba and Jelinek is a powerful general formalism for exploiting syntactic dependencies in a left-to-right language model for applications such as speech and handwriting recognition, spelling correction, and machine translation. Whereas smoothing for traditional N-gram models is well developed, optimal smoothing techniques for the SLM – discounting methods and hierarchical structures for back-off – are still being developed. In the SLM, the statistical dependencies of a word on immediately preceding words, preceding syntactic heads, non-terminal labels, etc., are parameterized as overlapping N-gram dependencies. Statistical dependencies in the parser and tagger used by the SLM also have N-gram-like structure. Deleted interpolation has been used to combine these N-gram-like models. We demonstrate on two different corpora – WSJ and Switchboard – that more recent modified back-off strategies and nonlinear interpolation methods considerably lower the perplexity of the SLM. An improvement in word error rate is also demonstrated on the Switchboard corpus.
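To make the idea of combining N-gram-like estimates by interpolation concrete, the following minimal Python sketch linearly interpolates a bigram estimate with a unigram estimate. It is an illustration only, not the SLM or the smoothing methods evaluated in the paper: the function names `train` and `interp_prob` are hypothetical, the corpus is assumed to be non-empty, and the weight `lam` is fixed by hand, whereas deleted interpolation would estimate such weights on held-out data.

```python
from collections import Counter


def train(tokens):
    """Collect unigram, bigram, and bigram-context counts from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Count each token only where it is followed by another token,
    # so the bigram estimate for a given context sums to one.
    contexts = Counter(tokens[:-1])
    return unigrams, bigrams, contexts


def interp_prob(word, prev, model, lam=0.7):
    """Return lam * P_ML(word | prev) + (1 - lam) * P_ML(word).

    lam is a hand-picked weight (an assumption of this sketch);
    deleted interpolation would tune it on held-out data instead.
    """
    unigrams, bigrams, contexts = model
    p_uni = unigrams[word] / sum(unigrams.values())
    if contexts[prev] == 0:
        # Unseen context: rely entirely on the unigram estimate.
        return p_uni
    p_bi = bigrams[(prev, word)] / contexts[prev]
    return lam * p_bi + (1 - lam) * p_uni


tokens = "the cat sat on the mat".split()
model = train(tokens)
print(interp_prob("cat", "the", model))
```

Because both component estimates are proper distributions over the vocabulary, the interpolated probabilities for a fixed context also sum to one, which is what allows the weights to be tuned freely.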
Similar resources
Use of Two Smoothing Parameters in Penalized Spline Estimator for Bi-variate Predictor Non-parametric Regression Model
Penalized spline criteria combine a goodness-of-fit function with a penalty function that contains smoothing parameters. These parameters control the smoothness of the curve and work together with the knot points and the spline degree. The regression function with two predictors in the non-parametric model will have two different non-parametric regression functions. Therefore, we...
Iranian EFL Teachers’ Cultural Identity in the Course of their Profession
Grounded on Hofstede's (1986) dichotomous model of collectivism/individualism, this study explored Iranian English as a foreign language (EFL) teachers' cultural identity. A sequential mixed methods procedure was adopted to examine their cultural orientation and the impact of length of experience on their degree of propensity to absorb the target language culture. A total of 120 female and male...
Smoothing Techniques for Tree-k-Grammar-Based Natural Language Modeling
In previous work, a new probabilistic context-free grammar (PCFG) model for natural language parsing, derived from a treebank corpus, was introduced. The model estimates the probabilities according to a generalized k-grammar scheme for trees. It allows for faster parsing, considerably decreases the perplexity of the test samples, and tends to give more structured and refined parses. Howeve...
Investigating the Relationship between Teaching Styles and Emotional Intelligence among Iranian English Instructors
This study investigated the relationship between five teaching styles and emotional intelligence among 102 Iranian English instructors from different universities in Tehran, Iran. To this end, the data were obtained through two phases of quantitative and qualitative data collection. To achieve quantitative data, the participants were asked to fill in two questionnaires, including the Teaching S...
Estimating structural relevance of XML elements through language model
Language modeling approaches have been extensively used as an effective way of measuring ad-hoc document content relevance. However, in structured information retrieval (SIR) there is to our knowledge no approach which aims at assessing structural relevance using language models. In this paper we present a language model based on document-query structure likelihood. As the effectiveness of lang...
Publication date: 2001